A Memory-based Robust Region Feature Synthesizer for Zero-Shot Object Detection

👤 Peiliang Huang
📅 Last updated on May 23, 2024
CVPR
M-RRFS Framework

Abstract

Zero-shot object detection (ZSD) aims to detect both the object categories that appear in the training phase and those that have never been observed before testing, which makes it a challenging yet anticipated task in the community. Current approaches tackle this problem by drawing on the feature synthesis techniques used in the zero-shot image classification (ZSC) task without delving into the inherent problems of ZSD.

In this paper, we analyze the outstanding challenges that ZSD presents compared with ZSC (severe intra-class variation, complex category co-occurrence, and open test scenarios) and reveal how they interfere with the region feature synthesis process.

Methodology

In view of this, we propose a novel memory-based robust region feature synthesizer (M-RRFS) for ZSD, which is equipped with the following mechanisms:

1. Intra-class Semantic Diverging (IntraSD): To overcome the inadequate intra-class diversity problem.

2. Inter-class Structure Preserving (InterSP): To address the insufficient inter-class separability issue.

3. Cross-Domain Contrast Enhancing (CrossCE): To solve the weak inter-domain contrast problem.
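
To make the idea of a contrast-based objective more concrete, here is a minimal sketch of a generic InfoNCE-style contrastive loss over feature vectors. It is not the loss used by IntraSD, InterSP, or CrossCE, whose exact formulations are not given here; the function names `cosine` and `info_nce` and the `temperature` value are assumptions made for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0


def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE-style loss (illustrative, not the paper's objective).

    The loss is small when the anchor is close to the positive and far from
    every negative, and large in the opposite case.
    """
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Hypothetical 2-D features: the anchor matches its positive and is orthogonal to the negative.
well_separated = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
# Reversed roles: the anchor is orthogonal to its positive and matches the negative.
poorly_separated = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
```

In this toy setting, `well_separated` is much smaller than `poorly_separated`, which is the behavior a contrast-enhancing term would be expected to reward.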

Moreover, when designing the overall learning framework, we develop an asynchronous memory container (AMC) that explores the cross-domain relationship between the seen-class domain and the unseen-class domain in order to reduce the overlap between their distributions. Based on the AMC, we also propose a memory-assisted ZSD inference process to further boost prediction accuracy.
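
As a rough intuition for what a feature memory over two domains could look like, the sketch below keeps a fixed-size first-in-first-out bank of features per domain and reports their average cross-domain cosine similarity as a crude overlap proxy. This is an assumption-laden illustration, not the AMC itself: the class name `FeatureMemory`, the FIFO update rule, and the similarity measure are all hypothetical, and the asynchronous update scheme is not modeled.

```python
import math
from collections import deque


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0


class FeatureMemory:
    """Hypothetical fixed-size FIFO memory holding features for two domains."""

    def __init__(self, capacity):
        # deque(maxlen=...) silently drops the oldest entry once the bank is full.
        self.banks = {
            "seen": deque(maxlen=capacity),
            "unseen": deque(maxlen=capacity),
        }

    def push(self, domain, feature):
        """Store a copy of one feature vector in the given domain's bank."""
        self.banks[domain].append(list(feature))

    def cross_domain_similarity(self):
        """Mean pairwise cosine similarity between the two banks (0.0 if either is empty)."""
        seen, unseen = self.banks["seen"], self.banks["unseen"]
        if not seen or not unseen:
            return 0.0
        total = sum(cosine(s, u) for s in seen for u in unseen)
        return total / (len(seen) * len(unseen))


memory = FeatureMemory(capacity=2)
for feat in ([1.0, 0.0], [1.0, 0.0], [0.0, 1.0]):
    memory.push("seen", feat)      # the first feature is evicted on the third push
memory.push("unseen", [0.0, 1.0])
overlap = memory.cross_domain_similarity()
```

A training loop could, under these assumptions, penalize a high `overlap` value to push the two domains apart; how the actual method uses its memory during training and inference is described in the paper rather than here.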

Results

To evaluate the proposed approach, we conduct comprehensive experiments on the MS-COCO, PASCAL VOC, ILSVRC, and DIOR datasets, and it achieves superior performance on them.

Notably, under the 48/17 category split setting (48 seen and 17 unseen classes) on the MS-COCO dataset, we achieve new state-of-the-art performance:

64.0% Recall@100 with IoU = 0.4
60.9% Recall@100 with IoU = 0.5
55.5% Recall@100 with IoU = 0.6
15.1% mAP with IoU = 0.5

Meanwhile, the experiments on the DIOR dataset establish the first benchmark for evaluating zero-shot object detection performance on remote sensing images.
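
For readers unfamiliar with the Recall@100 metric at a given IoU threshold, the following is a minimal single-image sketch of how such a number can be computed. It assumes axis-aligned boxes in `(x1, y1, x2, y2)` format and a simple greedy matching; the function names and data layout are illustrative, and the official evaluation protocol used for the reported results may differ (for example, in how matches are aggregated over the dataset).

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def recall_at_k(detections, ground_truth, k=100, iou_thr=0.5):
    """Fraction of ground-truth objects recovered by the top-k detections.

    detections:   list of (score, label, box) tuples for one image
    ground_truth: list of (label, box) tuples for the same image
    """
    top = sorted(detections, key=lambda d: d[0], reverse=True)[:k]
    used = set()   # indices of detections already matched to a ground-truth box
    matched = 0
    for gt_label, gt_box in ground_truth:
        for i, (_, label, box) in enumerate(top):
            if i in used or label != gt_label:
                continue
            if iou(box, gt_box) >= iou_thr:
                used.add(i)
                matched += 1
                break
    return matched / len(ground_truth) if ground_truth else 0.0


# Hypothetical image with two objects; only the "cat" is localized well enough.
gts = [("cat", (0, 0, 10, 10)), ("dog", (20, 20, 30, 30))]
dets = [(0.9, "cat", (0, 0, 10, 9)), (0.8, "dog", (0, 0, 5, 5))]
recall_loose = recall_at_k(dets, gts, k=100, iou_thr=0.5)
recall_strict = recall_at_k(dets, gts, k=100, iou_thr=0.95)
```

Raising the IoU threshold demands tighter localization, so the same detections recover fewer objects, which is consistent with the Recall@100 figures decreasing from IoU = 0.4 to IoU = 0.6.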